
    Communication-optimal Parallel and Sequential Cholesky Decomposition

    Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case). Communication costs often dominate arithmetic costs, so it is of interest to design algorithms minimizing communication. In this paper we first extend known lower bounds on the communication cost (both for bandwidth and for latency) of conventional $O(n^3)$ matrix multiplication to Cholesky factorization, which is used for solving dense symmetric positive definite linear systems. Second, we compare the costs of various Cholesky decomposition implementations to these lower bounds and identify the algorithms and data structures that attain them. In the sequential case, we consider both the two-level and hierarchical memory models. Combined with prior results in [13, 14, 15], this gives a set of communication-optimal algorithms for $O(n^3)$ implementations of the three basic factorizations of dense linear algebra: LU with pivoting, QR, and Cholesky. But it goes beyond this prior work on sequential LU by optimizing communication for any number of levels of memory hierarchy. Comment: 29 pages, 2 tables, 6 figures
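
    To make the blocking idea concrete, the sketch below shows a right-looking blocked Cholesky in Python/NumPy. This is an illustration of the general technique, not the paper's specific algorithm; the block size b is an assumed tuning parameter, chosen on the order of the square root of the fast-memory size so that each block step runs out of fast memory, which is what lets blocked variants approach the communication lower bound.

    ```python
    import numpy as np

    def blocked_cholesky(A, b=64):
        """Right-looking blocked Cholesky: returns lower-triangular L
        with A = L @ L.T for symmetric positive definite A. The block
        size b is the communication knob: with b ~ sqrt(M), where M is
        the fast-memory size, each step works on blocks that fit in
        fast memory."""
        n = A.shape[0]
        L = np.tril(A).astype(float)
        for k in range(0, n, b):
            e = min(k + b, n)
            # Factor the small diagonal block in fast memory.
            L[k:e, k:e] = np.linalg.cholesky(L[k:e, k:e])
            if e < n:
                # Triangular solve for the panel below the diagonal block:
                # L[e:, k:e] = A[e:, k:e] @ inv(L[k:e, k:e].T)
                L[e:, k:e] = np.linalg.solve(L[k:e, k:e], L[e:, k:e].T).T
                # Rank-b symmetric update of the trailing submatrix.
                L[e:, e:] -= L[e:, k:e] @ L[e:, k:e].T
        return np.tril(L)
    ```

    Blocking alone controls the bandwidth cost; attaining the latency bound additionally depends on the data layout, which is part of what the paper analyzes.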

    Minimizing Communication for Eigenproblems and the Singular Value Decomposition

    Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds were presented on the amount of communication required for essentially all $O(n^3)$-like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and analyze their convergence and communication costs. Comment: 43 pages, 11 figures
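
    For reference, the lower bounds in question can be stated compactly. The formulation below is the standard one for matmul-like $O(n^3)$ computations; $M$ denotes the fast-memory size (sequential case) or the memory per processor (parallel case), and the notation is mine rather than quoted from the paper.

    ```latex
    % Communication lower bounds for O(n^3)-like dense linear algebra,
    % in the style of [BDHS10]. M = fast-memory size (sequential)
    % or memory per processor (parallel).
    \[
      \#\text{words moved} = \Omega\!\left(\frac{n^3}{\sqrt{M}}\right),
      \qquad
      \#\text{messages} = \Omega\!\left(\frac{n^3}{M^{3/2}}\right).
    \]
    % With P processors and M = Theta(n^2/P), these become
    % Omega(n^2/sqrt(P)) words and Omega(sqrt(P)) messages per processor.
    ```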

    Toward accurate polynomial evaluation in rounded arithmetic

    Given a multivariate real (or complex) polynomial $p$ and a domain $\mathcal{D}$, we would like to decide whether an algorithm exists to evaluate $p(x)$ accurately for all $x \in \mathcal{D}$ using rounded real (or complex) arithmetic. Here "accurately" means with relative error less than 1, i.e., with some correct leading digits. The answer depends on the model of rounded arithmetic: we assume that for any arithmetic operation $op(a,b)$, for example $a+b$ or $a \cdot b$, its computed value is $op(a,b) \cdot (1+\delta)$, where $|\delta|$ is bounded by some constant $\epsilon$ with $0 < \epsilon \ll 1$, but $\delta$ is otherwise arbitrary. This model is the traditional one used to analyze the accuracy of floating point algorithms. Our ultimate goal is to establish a decision procedure that, for any $p$ and $\mathcal{D}$, either exhibits an accurate algorithm or proves that none exists. In contrast to the case where numbers are stored and manipulated as finite bit strings (e.g., as floating point numbers or rational numbers), we show that some polynomials $p$ are impossible to evaluate accurately. The existence of an accurate algorithm will depend not just on $p$ and $\mathcal{D}$, but on which arithmetic operations and which constants are available and whether branching is permitted. Toward this goal, we present necessary conditions on $p$ for it to be accurately evaluable on open real or complex domains $\mathcal{D}$. We also give sufficient conditions, and describe progress toward a complete decision procedure. We do present a complete decision procedure for homogeneous polynomials $p$ with integer coefficients, $\mathcal{D} = \mathbb{C}^n$, and using only the arithmetic operations $+$, $-$ and $\cdot$. Comment: 54 pages, 6 figures; refereed version; to appear in Foundations of Computational Mathematics: Santander 2005, Cambridge University Press, March 200
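
    As a toy illustration of this model (my example, not the paper's): because every operation may be perturbed by its own $\delta$, evaluating x+y+z as (x+y)+z can lose all relative accuracy whenever an intermediate sum is much larger than the final result. The Python sketch below simulates the model directly, drawing each $\delta$ at random within the bound $\epsilon$.

    ```python
    import random

    EPS = 1e-8  # the model's bound on |delta|

    def add(a, b):
        """One rounded addition in the Traditional Model: the exact
        result times (1 + delta), |delta| <= EPS but otherwise
        arbitrary (simulated here by a random draw)."""
        return (a + b) * (1 + random.uniform(-EPS, EPS))

    # Exact value of x + y + z is 1, but the intermediate x + y is ~1e8,
    # so its relative perturbation (up to 1e8 * EPS = 1) can wipe out
    # every leading digit of the final answer.
    x, y, z = 1e8, 1.0, -1e8
    computed = add(add(x, y), z)
    print(abs(computed - 1.0))  # often of order 1: no correct digits
    ```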

    Accurate and Efficient Expression Evaluation and Linear Algebra

    We survey and unify recent results on the existence of accurate algorithms for evaluating multivariate polynomials, and more generally for accurate numerical linear algebra with structured matrices. By "accurate" we mean that the computed answer has relative error less than 1, i.e., has some correct leading digits. We also address efficiency, by which we mean algorithms that run in polynomial time in the size of the input. Our results depend strongly on the model of arithmetic: most of our results use the so-called Traditional Model (TM). We give a set of necessary and sufficient conditions to decide whether a high accuracy algorithm exists in the TM, and describe progress toward a decision procedure that will take any problem and provide either a high accuracy algorithm or a proof that none exists. When no accurate algorithm exists in the TM, it is natural to extend the set of available accurate operations by a library of additional operations, such as $x+y+z$, dot products, or indeed any enumerable set, which could then be used to build further accurate algorithms. We show how our accurate algorithms and decision procedure for finding them extend to this case. Finally, we address other models of arithmetic, and the relationship between (im)possibility in the TM and (in)efficient algorithms operating on numbers represented as bit strings. Comment: 49 pages, 6 figures, 1 table
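
    One classical way to supply such an extra accurate operation in ordinary floating point is Knuth's TwoSum, which returns the rounded sum together with its exact rounding error; from it an accurate x+y+z can be assembled. This is a standard compensated-summation construction shown for illustration, not the paper's own operation library.

    ```python
    def two_sum(a: float, b: float):
        """Knuth's TwoSum: returns (s, e) with s = fl(a + b) and
        a + b = s + e exactly; e is the rounding error of the sum."""
        s = a + b
        b_virtual = s - a
        a_virtual = s - b_virtual
        e = (a - a_virtual) + (b - b_virtual)
        return s, e

    def three_sum(x: float, y: float, z: float) -> float:
        """x + y + z computed far more accurately than naive summation,
        by tracking the exact error of each partial sum."""
        s, e1 = two_sum(x, y)
        s, e2 = two_sum(s, z)
        return s + (e1 + e2)

    print(three_sum(1e16, 1.0, -1e16))  # 1.0; naive summation gives 0.0
    ```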

    Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures

    The QR factorization and the SVD are two fundamental matrix decompositions with applications throughout scientific computing and data analysis. For matrices with many more rows than columns, so-called "tall-and-skinny" matrices, there is a numerically stable, efficient, communication-avoiding algorithm for computing the QR factorization. It has been used in traditional high performance computing and grid computing environments. For MapReduce environments, existing methods to compute the QR decomposition use a numerically unstable approach that relies on indirectly computing the Q factor. In the best case, these methods require only two passes over the data. In this paper, we describe how to compute a stable tall-and-skinny QR factorization on a MapReduce architecture in only slightly more than two passes over the data. We can compute the SVD with only a small change and no difference in performance. We present a performance comparison between our new direct TSQR method, a standard unstable implementation for MapReduce (Cholesky QR), and the classic stable algorithm implemented for MapReduce (Householder QR). We find that our new stable method has a large performance advantage over the Householder QR method, both in a theoretical performance model and in an actual implementation.
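
    The heart of direct TSQR is simple to sketch: map tasks factor row blocks locally, a single reduce task factors the stacked R factors, and a second pass reconstructs Q explicitly from the pieces. Below is a minimal serial NumPy sketch of that idea; the MapReduce plumbing, block sizing, and the assumption that every block has at least as many rows as columns are all elided.

    ```python
    import numpy as np

    def tsqr(A, num_blocks=4):
        """Direct TSQR for a tall-and-skinny matrix A (m >> n).
        Pass 1 ("map"): local QR of each row block.
        "Reduce": QR of the stacked n x n R factors.
        Pass 2: rebuild Q by combining local Q factors."""
        blocks = np.array_split(A, num_blocks, axis=0)
        local = [np.linalg.qr(B) for B in blocks]        # (Q_i, R_i) per block
        R_stack = np.vstack([R for _, R in local])       # (num_blocks*n) x n
        Q2, R = np.linalg.qr(R_stack)                    # the reduce step
        Q2_blocks = np.split(Q2, num_blocks, axis=0)     # one n x n piece each
        Q = np.vstack([Qi @ Q2i                          # second pass over data
                       for (Qi, _), Q2i in zip(local, Q2_blocks)])
        return Q, R
    ```

    Since each block satisfies A_i = Q_i R_i and the stacked R factors satisfy R_i = (Q2)_i R, each block of the final Q is Q_i (Q2)_i, giving Q R = A with Q explicitly orthonormal; computing Q directly like this, rather than indirectly, is what distinguishes the stable approach.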

    Computing stable eigendecompositions of matrices

    If a matrix T is known only to within a tolerance ϵ (because of measurement or roundoff errors), then it may be difficult to compute an eigendecomposition of T, since its invariant subspaces are discontinuous functions of its entries. In this paper we show how to compute a stable decomposition of an uncertain matrix T which varies continuously and boundedly as T varies in a ball of radius ϵ.
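
    One way to realize such a decomposition in practice (an illustration in the spirit of the result, not the paper's exact procedure) is to compute a Schur form and treat every group of eigenvalues lying within ϵ of each other as a single cluster: the invariant subspace of the whole cluster varies continuously even when the individual eigenvectors inside it do not. A minimal SciPy sketch:

    ```python
    import numpy as np
    from scipy.linalg import schur

    def cluster_subspace(T, center, eps):
        """Orthonormal basis for the invariant subspace of all
        eigenvalues of T within eps of `center`. Lumping an
        eps-cluster into one subspace is what restores continuity."""
        # Complex Schur form with the selected cluster ordered first;
        # k is the number of selected eigenvalues.
        S, Z, k = schur(T, output='complex',
                        sort=lambda lam: abs(lam - center) < eps)
        return Z[:, :k]  # leading k Schur vectors span the subspace

    # Nearly defective: eigenvalues 1 ± 1e-6, eigenvectors ill-determined,
    # but the 2-D cluster subspace is perfectly stable.
    A = np.array([[1.0, 1.0], [1e-12, 1.0]])
    Q = cluster_subspace(A, center=1.0, eps=1e-3)
    ```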

    LU factorization with panel rank revealing pivoting and its communication avoiding version

    We present the LU decomposition with panel rank revealing pivoting (LU_PRRP), an LU factorization algorithm based on strong rank revealing QR panel factorization. LU_PRRP is more stable than Gaussian elimination with partial pivoting (GEPP). Our extensive numerical experiments show that the new factorization scheme is as numerically stable as GEPP in practice, but it is more resistant to pathological cases and easily solves the Wilkinson matrix and the Foster matrix. We also present CALU_PRRP, a communication avoiding version of LU_PRRP that minimizes communication. CALU_PRRP is based on tournament pivoting, with the selection of the pivots at each step of the tournament being performed via strong rank revealing QR factorization. CALU_PRRP is more stable than CALU, the communication avoiding version of GEPP. CALU_PRRP is also more stable in practice and is resistant to pathological cases on which GEPP and CALU fail. Comment: No. RR-7867 (2012)
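
    The tournament itself fits in a few lines: each block of rows nominates b candidate pivot rows, and winners play off pairwise until b rows remain for the panel. In the sketch below, SciPy's column-pivoted QR on the block's transpose stands in for the strong rank revealing QR the paper prescribes; names and block sizes are illustrative.

    ```python
    import numpy as np
    from scipy.linalg import qr

    def pick_rows(block, b):
        """Choose b candidate pivot rows from a block. Column-pivoted
        QR on the transpose is a stand-in for strong RRQR here."""
        _, _, piv = qr(block.T, mode='economic', pivoting=True)
        return piv[:b]  # the b "most independent" rows of the block

    def tournament_pivoting(A, b, block_rows=256):
        """Reduction tree over row blocks (tournament pivoting):
        each round merges pairs of candidate sets and reselects,
        until b pivot rows remain for the current panel."""
        m = A.shape[0]
        cands = [np.arange(i, min(i + block_rows, m))
                 for i in range(0, m, block_rows)]
        cands = [c[pick_rows(A[c], b)] for c in cands]      # local selection
        while len(cands) > 1:                               # play-off rounds
            pairs = [np.concatenate(cands[i:i + 2])
                     for i in range(0, len(cands), 2)]
            cands = [c[pick_rows(A[c], b)] for c in pairs]
        return cands[0]  # global row indices of the chosen pivots
    ```

    Because each round exchanges only b candidate rows instead of whole panels, the reduction tree needs O(log P) messages per panel across P processors, which is where the communication savings of CALU-style algorithms come from.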